Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models

Authors

  • Jon Barker
  • Frédéric Berthommier
Abstract

This paper examines the degree of correlation between lip and jaw configuration and speech acoustics. The lip and jaw positions are characterised by a system of measurements taken from video images of the speaker's face and profile, and the acoustics are represented using line spectral pair parameters and a measure of RMS energy. A correlation is found between the measured acoustic parameters and a linear estimate of the acoustics recovered from the visual data. This correlation exists despite the simplicity of the mapping and is in rough agreement with correlations measured in earlier work by Yehia et al. The linear estimates are also compared to estimates made using nonlinear models. In particular it is shown that although performance of the two models is remarkably similar for static visual features, non-linear models are better able to handle dynamic features.
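The linear estimate described in the abstract can be illustrated with a minimal sketch. It assumes an ordinary least-squares affine map from visual features to acoustic parameters, scored by per-parameter Pearson correlation on held-out frames. The abstract does not specify the fitting procedure, so this is one plausible reading rather than the authors' method. The data are synthetic stand-ins, and the feature counts (6 visual measurements, 11 acoustic parameters) and all function names are hypothetical.

```python
import numpy as np


def fit_linear_map(visual, acoustic):
    """Least-squares affine map from visual features to acoustic parameters."""
    # Append a constant column so the map includes a bias term.
    design = np.hstack([visual, np.ones((visual.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, acoustic, rcond=None)
    return weights


def apply_linear_map(weights, visual):
    """Estimate acoustic parameters from visual features."""
    design = np.hstack([visual, np.ones((visual.shape[0], 1))])
    return design @ weights


def per_parameter_correlation(measured, estimated):
    """Pearson correlation between each measured and estimated parameter."""
    m = measured - measured.mean(axis=0)
    e = estimated - estimated.mean(axis=0)
    return (m * e).sum(axis=0) / np.sqrt((m ** 2).sum(axis=0) * (e ** 2).sum(axis=0))


# Synthetic stand-in data (NOT the paper's data): rows are video frames.
# Dimensions are assumptions chosen only for illustration.
rng = np.random.default_rng(0)
n_frames, n_visual, n_acoustic = 500, 6, 11
visual = rng.normal(size=(n_frames, n_visual))
true_map = rng.normal(size=(n_visual, n_acoustic))
acoustic = visual @ true_map + rng.normal(scale=2.0, size=(n_frames, n_acoustic))

# Fit on the first 400 frames, evaluate on the remaining 100.
train, test = slice(0, 400), slice(400, None)
weights = fit_linear_map(visual[train], acoustic[train])
estimate = apply_linear_map(weights, visual[test])
correlations = per_parameter_correlation(acoustic[test], estimate)
print(correlations.round(2))
```

Evaluating on frames that were not used for fitting keeps the reported correlations from being inflated by overfitting. A non-linear model could be compared by swapping out `fit_linear_map` and `apply_linear_map` while keeping the same scoring function.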

Similar resources

A comparison of acoustic coding models for speech-driven facial animation

This article presents a thorough experimental comparison of several acoustic modeling techniques by their ability to capture information related to orofacial motion. These models include (1) Linear Predictive Coding and Linear Spectral Frequencies, which model the dynamics of the speech production system, (2) Mel Frequency Cepstral Coefficients and Perceptual Critical Feature Bands, which encod...


Utilizing Kernel Adaptive Filters for Speech Enhancement within the ALE Framework

Performance of the linear models, widely used within the framework of adaptive line enhancement (ALE), deteriorates dramatically in the presence of non-Gaussian noises. On the other hand, adaptive implementation of nonlinear models, e.g. the Volterra filters, suffers from the severe problems of large number of parameters and slow convergence. Nonetheless, kernel methods are emerging solutions t...


Classification of emotional speech using spectral pattern features

Speech Emotion Recognition (SER) is a new and challenging research area with a wide range of applications in man-machine interactions. The aim of a SER system is to recognize human emotion by analyzing the acoustics of speech sound. In this study, we propose Spectral Pattern features (SPs) and Harmonic Energy features (HEs) for emotion recognition. These features extracted from the spectrogram ...


Measuring the relation between speech acoustics and 2D facial motion

This paper presents a quantitative analysis of the relation between speech acoustics and the 2D video signal of the facial motion that occurs simultaneously. 2D facial motion is acquired using an ordinary video camera: after digitizing a video sequence, a search algorithm is used for tracking markers painted on the speaker’s face. Facial motion is represented by the 2D marker trajectories; wher...


An Investigation of Dual Task Effect on The Severity of Stuttering in School-Age Children

Objective: Stuttering is a speech disorder that occurs with frequent and abnormal disruptions in speech, such as sound repetition, sound prolongation, and sound or airflow blockage. Although various hypotheses and factors have been introduced including cognitive and linguistic factors, the etiology of stuttering has not been fully understood. According to the vicious circle hypothesis, increase...




Publication date: 1999